AITopics | global learning rate

global learning rate

e1054bf2d703bca1e8fe101d3ac5efcd-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-14-2026, 16:53:09 GMT

equation, statistics, variance, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

e1054bf2d703bca1e8fe101d3ac5efcd-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-20-2025, 06:18:57 GMT

We thank the reviewers for their time, effort, and helpful feedback. We address individual comments below. The training loss in all of our examples can be written in the form of lines 17-18. We will include these additional references and add a proof to the appendix. Consider equation (6) when the assumptions of Section 2.1 apply, i.e., Our test can be considered as a more general version of testing whether the loss has reached a constant value.

equation, statistics, variance, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and Adam

Atamna, Asma, Maus, Tom, Kievelitz, Fabian, Glasmachers, Tobias

arXiv.org Artificial Intelligence | Aug-8-2025

The learning rate is a crucial hyperparameter in deep learning, with its ideal value depending on the problem and potentially changing during training. In this paper, we investigate the practical utility of adaptive learning rate mechanisms that adjust step sizes dynamically in response to the loss landscape. We revisit a cumulative path-based adaptation scheme proposed in 2017, which adjusts the learning rate based on the discrepancy between the observed path length, computed as a time-discounted sum of normalized gradient steps, and the expected length of a random walk. While the original approach offers a compelling intuition, we show that its adaptation mechanism for Adam is conceptually inconsistent due to the optimizer's internal preconditioning. We propose a corrected variant that better reflects Adam's update dynamics. To assess the practical value of online learning rate adaptation, we benchmark SGD and Adam, with and without cumulative adaptation, and compare them to a recent alternative method. Our results aim to clarify when and why such adaptive strategies offer practical benefits.
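The path-based idea in this abstract can be illustrated with a minimal sketch. It is not the paper's update rule: the discount `c`, the damping `d`, the squared-length comparison, and the function name `cumulative_lr_update` are all assumptions chosen for illustration. The sketch keeps a time-discounted sum of normalized gradient steps and compares its squared length with 1, the value it would settle at for uncorrelated unit steps under this normalization.

```python
import math

def cumulative_lr_update(path, grad, lr, c=0.2, d=0.1):
    """One step of a hypothetical cumulative (path-based) learning rate rule.

    path: time-discounted sum of normalized gradient directions
    c:    discount factor for the path (assumed value)
    d:    damping of the learning rate change (assumed value)
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No direction information; leave the state unchanged.
        return path, lr
    # The factor sqrt(c * (2 - c)) keeps the expected squared path length
    # at 1 when successive unit steps are uncorrelated (a random walk).
    scale = math.sqrt(c * (2.0 - c))
    new_path = [(1.0 - c) * p + scale * g / norm for p, g in zip(path, grad)]
    sq_len = sum(p * p for p in new_path)
    # Longer than a random walk: steps agree, so grow the learning rate.
    # Shorter: steps cancel out, so shrink it.
    new_lr = lr * math.exp(d * (sq_len - 1.0))
    return new_path, new_lr
```

With these assumed constants, repeatedly feeding the same gradient direction raises the learning rate over a few iterations, while gradients that flip sign every step lower it.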

artificial intelligence, machine learning, optimizer, (16 more...)

arXiv.org Artificial Intelligence

2508.05408

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting > Online (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

FedDuA: Doubly Adaptive Federated Learning

Takakura, Shokichi, Liew, Seng Pei, Hasegawa, Satoshi

arXiv.org Machine Learning | May-19-2025

Federated learning is a distributed learning framework where clients collaboratively train a global model without sharing their raw data. FedAvg is a popular algorithm for federated learning, but it often suffers from slow convergence due to the heterogeneity of local datasets and anisotropy in the parameter space. In this work, we formalize the central server optimization procedure through the lens of mirror descent and propose a novel framework, called FedDuA, which adaptively selects the global learning rate based on both inter-client and coordinate-wise heterogeneity in the local updates. We prove that our proposed doubly adaptive step-size rule is minimax optimal and provide a convergence analysis for convex objectives. Although the proposed method does not require additional communication or computational cost on clients, extensive numerical experiments show that our proposed framework outperforms baselines in various settings and is robust to the choice of hyperparameters.
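To make the role of the global learning rate concrete, here is a minimal sketch of a server-side aggregation step. It shows only a fixed, scalar global learning rate applied to the averaged client updates; it does not implement FedDuA's adaptive rule. The function name `server_update` and the equal weighting of clients are assumptions.

```python
def server_update(global_weights, client_deltas, global_lr=1.0):
    """Apply the mean client update to the global model.

    client_deltas: one list per client holding (local weights - global weights).
    global_lr:     scalar server step size; 1.0 gives a plain average.
    Clients are weighted equally here, which is an assumption.
    """
    n_clients = len(client_deltas)
    mean_delta = [
        sum(delta[i] for delta in client_deltas) / n_clients
        for i in range(len(global_weights))
    ]
    return [w + global_lr * m for w, m in zip(global_weights, mean_delta)]
```

For example, `server_update([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], global_lr=0.5)` moves the model halfway along the averaged update. An adaptive method would replace the scalar `global_lr` with a value, or a per-coordinate vector, computed from the client updates themselves.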

artificial intelligence, feddua, machine learning, (15 more...)

arXiv.org Machine Learning

2505.11126

Genre: Research Report (0.50)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Contouring learning rate to optimize neural nets

#artificialintelligence | Sep-1-2017, 15:45:14 GMT

Check out Siddha Ganju's talk on embedded deep learning at the Artificial Intelligence Conference in San Francisco, Sept. 17-20, 2017. The learning rate is the rate at which a neural network accumulates information over time. It determines how quickly, and whether at all, the network reaches the optimum, that is, the weight configuration most conducive to the desired output. In plain stochastic gradient descent (SGD), a single global learning rate is used, so the step size is independent of the shape of the error gradient. However, many modifications to the original SGD update rule relate the learning rate to the magnitude and orientation of the error gradient.
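The contrast between a global learning rate and a gradient-dependent one can be sketched in a few lines. The first function is the plain SGD update; the second is an AdaGrad-style update, used here as one well-known example of such a modification and not necessarily the one the article goes on to discuss. The function names and the `eps` constant are illustrative assumptions.

```python
import math

def sgd_step(weights, grads, lr):
    """Plain SGD: every coordinate shares the same global learning rate."""
    return [w - lr * g for w, g in zip(weights, grads)]

def adagrad_step(weights, grads, accum, lr, eps=1e-8):
    """AdaGrad-style step: each coordinate is scaled by its gradient history.

    accum holds the running sum of squared gradients per coordinate.
    Returns the new weights and the updated accumulator.
    """
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_weights = [
        w - lr * g / (math.sqrt(a) + eps)
        for w, g, a in zip(weights, grads, new_accum)
    ]
    return new_weights, new_accum
```

With gradients `[2.0, 4.0]` and `lr=0.5`, `sgd_step` moves the second coordinate twice as far as the first, whereas the first `adagrad_step` from a zero accumulator moves both by roughly the same amount, because each coordinate is normalized by its own gradient magnitude.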

artificial intelligence, learning rate, machine learning, (17 more...)

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.24)

Industry: Transportation (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

[1606.04474] Learning to learn by gradient descent by gradient descent • /r/MachineLearning

@machinelearnbot | Jun-15-2016, 03:10:13 GMT

One thing I'm not sure about is how fair their comparison is. By that I mean that they fix the global learning rate for the "hand designed" algos and choose it by grid search. However, we know well that in most problems we can start with a larger learning rate and decay it over time once the loss plateaus. The issue with not considering that is that the best fixed global learning rate for the whole run would probably be one which is very slow at first but eventually outperforms the faster ones. Nevertheless, this is interesting work, although I'm still quite skeptical that such optimizers will generalize well to large models.
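As a rough illustration of the "start large, decay after a plateau" baseline the comment has in mind, here is a minimal sketch of a plateau-based schedule. The class name `PlateauDecay` and the defaults for `factor` and `patience` are assumptions, not something taken from the paper or the thread.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` after `patience` steps without improvement."""

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss):
        """Record a new loss value and return the learning rate to use next."""
        if loss < self.best:
            # Progress: remember the new best loss and reset the counter.
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                # Plateau detected: shrink the learning rate and start counting again.
                self.lr *= self.factor
                self.bad_steps = 0
        return self.lr
```

A hand-designed optimizer tuned this way would be a stronger baseline than one with a single learning rate fixed by grid search for the whole run.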

artificial intelligence, gradient descent, machine learning, (2 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

Less Regret via Online Conditioning

Streeter, Matthew, McMahan, H. Brendan

arXiv.org Artificial Intelligence | Feb-25-2010

In the past few years, online algorithms have emerged as state-of-the-art techniques for solving large-scale machine learning problems [2, 13, 16]. In addition to their simplicity and generality, online algorithms are natural choices for problems where new data is constantly arriving and rapid adaptation is important. Compared to the study of convex optimization in the batch (offline) setting, the study of online convex optimization is relatively new. In light of this, it is not surprising that performance-improving techniques that are well known and widely used in the batch setting do not yet have online analogues. In particular, convergence rates in the batch setting can often be dramatically improved through the use of preconditioning. Yet, the online convex optimization literature provides no comparable method for improving regret (the online analogue of convergence rates).

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1002.4862

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.93)
Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Tempering Backpropagation Networks: Not All Weights are Created Equal

Schraudolph, Nicol N., Sejnowski, Terrence J.

Neural Information Processing Systems | Dec-31-1996

Backpropagation learning algorithms typically collapse the network's structure into a single vector of weight parameters to be optimized. We suggest that their performance may be improved by utilizing the structural information instead of discarding it, and introduce a framework for "tempering" each weight accordingly. In the tempering model, activation and error signals are treated as approximately independent random variables. The characteristic scale of weight changes is then matched to that of the residuals, allowing structural properties such as a node's fan-in and fan-out to affect the local learning rate and backpropagated error. The model also permits calculation of an upper bound on the global learning rate for batch updates, which in turn leads to different update rules for bias vs. non-bias weights. This approach yields hitherto unparalleled performance on the family relations benchmark, a deep multi-layer network: for both batch learning with momentum and the delta-bar-delta algorithm, convergence at the optimal learning rate is sped up by more than an order of magnitude.

global learning rate, learning rate, tempering backpropagation network, (14 more...)

Neural Information Processing Systems

Country: